| Key Words |
|---|
| Word Complexity |
| Lyric Length |
| Sentiment Score |

With 2 clusters, the model has a variance of 0.226, which indicates a fairly weak model. The following graphs show the clusters with and without color, using sentiment score and word complexity as variables.
It seems that 3 clusters is the best choice.
Here is the graph for 3 clusters using the same variables. The variance for this model is 0.313, which is better than that of the 2-cluster model.
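
The comparison between 2 and 3 clusters can be sketched in a few lines of Python. This is only an illustration under two assumptions that the write-up does not confirm: that the clustering method is k-means, and that the reported "variance" is the share of total variation explained by the clusters (between-cluster sum of squares divided by total sum of squares). The helper names, the deterministic farthest-point start, and the toy points are all hypothetical and are not the song data.

```python
from statistics import fmean

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(fmean(axis) for axis in zip(*points))

def init_centers(points, k):
    """Deterministic farthest-point start (an assumption, chosen for repeatability)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and center updates until stable."""
    centers = init_centers(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            updated.append(mean_point(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, labels

def explained_ratio(points, k):
    """Between-cluster SS / total SS, computed as 1 - within SS / total SS."""
    centers, labels = kmeans(points, k)
    overall = mean_point(points)
    total_ss = sum(sq_dist(p, overall) for p in points)
    within_ss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return 1 - within_ss / total_ss

# Toy 2-D points standing in for (sentiment score, word complexity); not the song data.
toy = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0),
    (5.0, 10.0), (5.0, 11.0), (6.0, 10.0), (6.0, 11.0),
]
ratio_2 = explained_ratio(toy, 2)
ratio_3 = explained_ratio(toy, 3)
print(f"k=2: {ratio_2:.3f}, k=3: {ratio_3:.3f}")
```

Under this reading, a higher ratio means the clusters account for more of the spread in the data, which is the sense in which 0.313 is better than 0.226. The ratio generally rises as clusters are added, so it is usually weighed against how much each extra cluster actually helps.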
The graph below shows the relationship among the three variables in the clustering model. We can see a relationship between word complexity and song length, which makes intuitive sense, since longer songs may have more complex lyrics. However, the majority of songs fall in the bottom cluster, so we zoom in there and select songs with a normalized length below 0.2 for a closer look.
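
The zoom step can be sketched as follows. The write-up does not say which normalization was used, so min-max scaling to the range [0, 1] is an assumption here, and the lengths below are made-up word counts rather than values from the analysis.

```python
def min_max(values):
    """Rescale values linearly to the [0, 1] range (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw lyric lengths (word counts); not data from the analysis.
lengths = [120, 250, 310, 480, 2100]
scaled = min_max(lengths)

# Keep only the songs whose normalized length is below the 0.2 cutoff.
zoomed = [(raw, s) for raw, s in zip(lengths, scaled) if s < 0.2]
print(zoomed)
```

With this kind of scaling, a single very long song stretches the range and pushes most other songs toward the bottom of the axis, which is one plausible reason the majority end up in the lowest cluster before zooming in.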
Now it is clear that each category of songs does tend to cluster together, meaning that songs in the same category have similar features. The lyrics of rap songs are more complex than those of other categories, and they are generally longer as well. However, rap songs tend to have lower sentiment scores, while R&B and Country songs can have very high sentiment scores. Rock songs vary in all three measures, and the complexity of rock lyrics is either very high or below average. Overall, rap songs are generally longer, more complex, and lower in sentiment score. Rock and Country songs vary across the three measures. Country and R&B are similar, but R&B songs are a little longer and more complex. Prediction based on this 3-cluster model has a variance of 0.313.